Convolutional Tensor-Train LSTM for Spatio-Temporal Learning
Learning from spatio-temporal data has numerous applications such as human-behavior analysis, object tracking, video compression, and physics simulation. However, existing methods still perform poorly on challenging video tasks such as long-term forecasting, because these tasks require learning long-term spatio-temporal correlations in the video sequence. In this paper, we propose a higher-order convolutional LSTM model that can efficiently learn these correlations, along with a succinct representation of the history. This is accomplished through a novel tensor-train module that performs prediction by combining convolutional features across time. To make this feasible in terms of computation and memory requirements, we propose a novel convolutional tensor-train decomposition of the higher-order model. This decomposition reduces the model complexity by jointly approximating a sequence of convolutional kernels as a low-rank tensor-train factorization. As a result, our model outperforms existing approaches, including the baseline models, while using only a fraction of their parameters. Our results achieve state-of-the-art performance in a wide range of applications and datasets, including multi-step video prediction on the Moving-MNIST-2 and KTH action datasets as well as early activity recognition on the Something-Something V2 dataset.
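As a rough illustration of what a low-rank tensor-train factorization stores, the sketch below evaluates single entries of a tensor directly from its cores in plain Python. It uses the standard tensor-train format, not the convolutional variant proposed in the paper, and the names `matmul` and `tt_entry` as well as the nested-list layout are assumptions made for illustration only.

```python
# Minimal sketch of the standard (non-convolutional) tensor-train format.
# Assumption: cores[k][i] is the (r_{k-1} x r_k) matrix slice selected by
# index value i along mode k, with boundary ranks r_0 = r_N = 1.

def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def tt_entry(cores, index):
    """Return one tensor entry as a product of matrix slices, one per mode."""
    product = cores[0][index[0]]
    for core, i in zip(cores[1:], index[1:]):
        product = matmul(product, core[i])
    # Boundary ranks are 1, so the chained product is a 1 x 1 matrix.
    return product[0][0]


# A 2 x 2 tensor stored with tensor-train rank 2 (hypothetical values).
cores = [
    [[[1.0, 2.0]], [[0.5, 0.0]]],      # mode 0: two 1 x 2 slices
    [[[3.0], [4.0]], [[1.0], [1.0]]],  # mode 1: two 2 x 1 slices
]
print(tt_entry(cores, (0, 0)))  # 1*3 + 2*4 = 11.0
print(tt_entry(cores, (1, 1)))  # 0.5*1 + 0*1 = 0.5
```

The full tensor is never materialized: each entry is recovered on demand from the small cores, which is what keeps the storage cost low when the ranks are small.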
Review for NeurIPS paper: Convolutional Tensor-Train LSTM for Spatio-Temporal Learning
Weaknesses: The two major weaknesses are a lack of comparison to previous work by Yang et al. Although it does not rely on the same structure as this work (smooth evolution over time in video data vs. tensor train), it does rely on a somewhat similar structure (i.e. Looking at these two side by side, I appreciate their differences; however, I think they are still too similar not to require a comparison. One could conceivably imagine that the same underlying structure is exploited by both approaches, which diminishes the novelty of the work. It remains to be seen whether this application of tensor train is orthogonal to the application of tensor train by Yang et al.
Review for NeurIPS paper: Convolutional Tensor-Train LSTM for Spatio-Temporal Learning
This paper develops a higher-Markov-order convolutional LSTM based on tensor-train decomposition, with applications to spatio-temporal activity analysis in videos. The reviews were mixed but marginally positive on average, and the scores increased slightly following the rebuttal and some discussion. There is a consensus that the approach is novel and interesting. The main criticism is that, despite the extensive experiments, it remains unclear whether it is the novel formulation itself that is producing the observed improvements, or the many other points that differ relative to the baselines. The advantages of using Markov order 1 in this application also need to be clarified. Overall, the AC and SAC agreed that this was above threshold for NeurIPS.
Convolutional Tensor-Train LSTM for Spatio-temporal Learning
Su, Jiahao, Byeon, Wonmin, Huang, Furong, Kautz, Jan, Anandkumar, Animashree
Higher-order Recurrent Neural Networks (RNNs) are effective for long-term forecasting since such architectures can model higher-order correlations and long-term dynamics more effectively. However, higher-order models are expensive and require exponentially more parameters and operations compared with their first-order counterparts. This problem is particularly pronounced in multidimensional data such as videos. To address this issue, we propose Convolutional Tensor-Train Decomposition (CTTD), a novel tensor decomposition with convolutional operations. With CTTD, we construct Convolutional Tensor-Train LSTM (Conv-TT-LSTM) to capture higher-order space-time correlations in videos. We demonstrate that the proposed model outperforms the conventional (first-order) Convolutional LSTM (ConvLSTM) as well as the state-of-the-art ConvLSTM-based approaches in pixel-level video prediction tasks on the Moving-MNIST and KTH action datasets, while using far fewer parameters.
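To make the parameter-count argument concrete, the sketch below compares a dense order-N tensor with a plain tensor-train factorization of it. It counts parameters for the standard (non-convolutional) tensor-train format rather than for CTTD itself, and the function names and the uniform internal rank are illustrative assumptions.

```python
# Minimal sketch: parameter counts for a dense tensor vs. a standard
# tensor-train factorization. Assumptions: every mode has the same size
# `dim`, every internal rank equals `rank`, and boundary ranks are 1.

def full_tensor_params(order: int, dim: int) -> int:
    """A dense order-`order` tensor has dim**order entries (exponential in order)."""
    return dim ** order


def tensor_train_params(order: int, dim: int, rank: int) -> int:
    """Sum the sizes of the cores, each of shape (r_prev, dim, r_next)."""
    ranks = [1] + [rank] * (order - 1) + [1]
    return sum(ranks[k] * dim * ranks[k + 1] for k in range(order))


print(full_tensor_params(4, 8))      # 8**4 = 4096
print(tensor_train_params(4, 8, 2))  # 16 + 32 + 32 + 16 = 96
```

For a fixed rank, the tensor-train count grows linearly with the order while the dense count grows exponentially, which is the gap that low-rank factorizations of higher-order models aim to close.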